Statistical MT Systems Revisited: How much Hybridity do they have?

Author

  • Hermann Ney
Abstract

The statistical approach to MT started about twenty-five years ago and is now widely accepted as an alternative to the classical approach based on manually designed rules. One attractive property of the statistical approach is its ability to learn the translation models automatically from a sufficiently large set of source-target sentence pairs. The need for manually designed rules and for human interaction can therefore be reduced dramatically when developing an MT system for a new application or language pair. The idea of hybrid MT is to combine the advantages of the rule-based and statistical approaches. In practice, most statistical MT systems already use manually designed rules to improve MT accuracy. We revisit the RWTH systems to study the effect of typical preprocessing steps based on manually designed rules. The RWTH systems cover various tasks (e.g. news, patents, lectures) and various languages (e.g. Arabic, Chinese, English, Japanese). The preprocessing steps may include a categorization of numbers and of date and time expressions, a word decomposition based on morphological analysis, and explicit word re-ordering based on a syntactic analysis. In general, the preprocessing steps may depend heavily on the language pair under consideration. We will also address concepts that aim at a tighter integration of the conventional rule-based and statistical approaches, and we will consider the implications of such a tight integration for the architecture of an MT system.
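As a rough illustration of the first preprocessing step named in the abstract (categorization of numbers, date and time expressions), the following is a minimal sketch of a rule-based categorizer. It is not the RWTH implementation: the regular expressions, the placeholder tokens `<DATE>`, `<TIME>` and `<NUMBER>`, and the function name `categorize` are all assumptions made for this example.

```python
import re

# Hypothetical hand-written rules; real systems use far richer,
# language-specific patterns.
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
TIME = re.compile(r"\b\d{1,2}:\d{2}\b")
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")

# Order matters: dates and times are handled first so that their digits
# are not consumed by the generic number rule.
RULES = ((DATE, "<DATE>"), (TIME, "<TIME>"), (NUMBER, "<NUMBER>"))


def categorize(sentence):
    """Replace dates, times and numbers with category tokens.

    Returns the rewritten sentence and the replaced strings, collected
    rule by rule (dates, then times, then numbers), so that they could
    be restored after translation.
    """
    originals = []

    def make_sub(label):
        def sub(match):
            originals.append(match.group(0))
            return label
        return sub

    for pattern, label in RULES:
        sentence = pattern.sub(make_sub(label), sentence)
    return sentence, originals


if __name__ == "__main__":
    text = "The meeting on 12/05/2013 at 14:30 drew 1,200 people."
    print(categorize(text))
```

Collapsing such expressions into a small set of category tokens reduces data sparsity for the statistical model, while the stored originals allow the concrete values to be reinserted into the output.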

Similar resources

Hybridity of Scientific Discourses: an Intertextual Perspective and Implications for ESP Pedagogy

In light of a large number of admirable attempts which look at scientific discourse from social, dialogic and interpersonal points of view, the propositions which consider scientific discourse as an interactive endeavor are now well-established. By the force of our social constructivist gyrations, we have developed glimpses of a social, cultural and historical dimension in which the discourse o...

On inferring hybridity from morphological intermediacy

ing hybridity, analyses should distinguish between the two types of intermediacy. (1) Hybrid indices fail to do so. (2) Principal components analysis does so only in an ambiguous way. (3) Pictorialized scatter diagrams properly present the evidence for an interpretation that is intuitive. (4) Counting characters as intermediate or not-intermediate is an explicit approach that allows for statist...

Information Systems in Management

Information systems and information technology discuss topics in organizations around the world. The enterprises in which they are discussed vary in size and industry type, yet there is a common recognition that the manner in which these systems are used will not only influence the success of the organization but can change the world in which they function. The challenge to management of organi...

Optimists and Skeptics

For building resources and performing MT: Do automated techniques deliver? This (possibly controversial) panel will examine where the recently developed techniques of automated (statistical and other) methods for MT and Computational Linguistics are leading. In addition to performing some subtasks of MT, these techniques have proven rather useful for building resources. But, ultimately, the que...

A Critique of Statistical Machine Translation

Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless—and this may come as some surprise to the PB-SMT community—most translators, and somewhat more surprisingly perhaps, many experienced MT protagonists, find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be ...

Publication date: 2013